World-wide Covid-19 Data Analyses and Predictions

Author: Saroj Pathak (Spathak@hawaii.edu)
Department of Civil and Environmental Engineering
University of Hawaii at Manoa

Introduction:

First detected in the city of Wuhan, China, in December 2019, the 2019 novel coronavirus, which causes the respiratory illness Covid-19, spread into a worldwide pandemic within a short period. To date, millions of people across the world have lost their lives to the infection, and thousands more are still suffering from it every day. Although the pandemic is not yet over, this project aims to analyze and visualize how different countries in the world have been affected by it. The project also aims to build time-series based predictive models to forecast future confirmed cases and deaths. Data for the analysis were obtained from Kaggle and GitHub: all Covid-19 data are loaded from the Johns Hopkins CSSE data repository, country code data are loaded directly from a GitHub page, and population data are obtained from Kaggle.
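As a minimal sketch of the loading step, the code below collapses a JHU CSSE style global time-series CSV into one cumulative series per country. It assumes the layout of the repository's `time_series_covid19_*_global.csv` files (Province/State, Country/Region, Lat, Long, then one column per date) and uses only the standard library; the function names `country_totals` and `latest` are hypothetical, and the sample rows are toy data, not real figures. In a notebook, a pandas `groupby` on `Country/Region` would be the more typical choice.

```python
import csv
import io
from collections import defaultdict


def country_totals(csv_text):
    """Sum province-level rows of a JHU CSSE style wide time series by country.

    Assumed layout: Province/State, Country/Region, Lat, Long, followed by
    one column per date holding cumulative counts.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    dates = header[4:]  # every column after Lat/Long is a date
    totals = defaultdict(lambda: [0] * len(dates))
    for row in reader:
        country = row[1]
        for i, value in enumerate(row[4:]):
            totals[country][i] += int(value)  # add this province's counts
    return dates, dict(totals)


def latest(totals):
    """Most recent cumulative count for each country."""
    return {country: series[-1] for country, series in totals.items()}


# Toy input: two provinces of one country plus a country without provinces.
sample = (
    "Province/State,Country/Region,Lat,Long,1/22/20,1/23/20\n"
    "Ontario,Canada,0,0,0,1\n"
    "Quebec,Canada,0,0,0,2\n"
    ',"Korea, South",0,0,1,1\n'
)
dates, totals = country_totals(sample)
print(latest(totals))  # {'Canada': 3, 'Korea, South': 1}
```

The `csv` module handles quoted country names such as "Korea, South", which a naive split on commas would break.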

Objectives:

 To visualize the total confirmed cases and the total death toll across the world through map visualizations.
 To visualize statistics such as (confirmed/population), (death/confirmed), and (death/population) based on the latest data.
 To identify the top 10 and bottom 10 countries in terms of (confirmed/population), (death/confirmed), and (death/population).
 To perform a cluster analysis of the countries based on confirmed cases, (death/confirmed), and (death/population).
 To build time-series based predictive models to forecast future confirmed cases and deaths.

Visualization of Latest Covid-19 Data based on Confirmed Cases

Visualization of Latest Covid-19 Data based on Death Toll

The above visualizations show only raw counts. Countries with larger populations naturally tend to have more cases than countries with smaller populations, so comparing countries on case counts alone is misleading. To compare more meaningfully how different countries are affected by the pandemic, the data should be normalized, for example by population or by the number of confirmed cases; a few such visualizations are presented below.
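The three normalized statistics used in the maps below can be computed from the latest per-country counts roughly as follows. This is a sketch under assumptions: the inputs are plain dictionaries keyed by country name (with population names already matched to the JHU names, for example through the country-code table), the function name `normalized_stats` and the toy countries are hypothetical, and countries with zero population or zero confirmed cases are simply skipped.

```python
def normalized_stats(confirmed, deaths, population):
    """Per-country percentages used for the normalized comparisons.

    All three arguments map country name -> latest value. Countries missing
    from any input, or with zero population or zero confirmed cases, are
    skipped to avoid division by zero.
    """
    stats = {}
    for country in confirmed.keys() & deaths.keys() & population.keys():
        c, d, p = confirmed[country], deaths[country], population[country]
        if c == 0 or p == 0:
            continue
        stats[country] = {
            "confirmed_per_population_pct": 100 * c / p,
            "deaths_per_confirmed_pct": 100 * d / c,
            "deaths_per_population_pct": 100 * d / p,
        }
    return stats


# Toy numbers for two hypothetical countries.
stats = normalized_stats(
    confirmed={"Country A": 500, "Country B": 50},
    deaths={"Country A": 10, "Country B": 2},
    population={"Country A": 10_000, "Country B": 1_000},
)
print(stats["Country B"]["deaths_per_confirmed_pct"])  # 4.0
```

In this toy example both countries have 5 % of their population confirmed, and Country A has five times as many deaths, yet Country B's death/confirmed percentage (4 %) is twice Country A's (2 %): exactly the kind of difference raw counts hide.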

Map Visualization of Percentage of Confirmed Cases Based on Population

Clearly, South American and European countries are more affected than Asian and African countries.

Next, the ratio of deaths to confirmed cases is worth examining, as it allows the death rate among confirmed cases (the case fatality rate) to be compared across countries.

Map Visualization of Percentage of Death Cases Based on Confirmed Cases

Comparisons based on confirmed cases do not always tell the complete story: some countries do not test enough, so their confirmed case counts are not representative of how badly they are affected by the virus. The death toll can be a more representative measure, since deaths are less likely to go unrecorded than infections.

Map Visualization of Percentage of Death Cases Based on Population

As with confirmed cases, South American and European countries are more affected than Asian and African countries.
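For the top 10 and bottom 10 lists named in the objectives, a simple sort over one normalized statistic is enough. The helper below is a hypothetical sketch: it takes a dictionary of country -> value (for example, the death/population percentage of every country) and returns the `n` highest and `n` lowest entries; the toy values are illustrative only.

```python
from operator import itemgetter


def top_and_bottom(values, n=10):
    """Return the n highest and the n lowest (country, value) pairs.

    The top list runs from highest to lowest, the bottom list from lowest
    to highest.
    """
    by_value = itemgetter(1)  # sort on the value, not the country name
    top = sorted(values.items(), key=by_value, reverse=True)[:n]
    bottom = sorted(values.items(), key=by_value)[:n]
    return top, bottom


# Toy values for four hypothetical countries, ranked with n=2.
top, bottom = top_and_bottom({"P": 3.0, "Q": 1.0, "R": 2.0, "S": 0.5}, n=2)
print(top)     # [('P', 3.0), ('R', 2.0)]
print(bottom)  # [('S', 0.5), ('Q', 1.0)]
```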

Time-series Based Clustering Based on Confirmed Cases

Time-series Based Clustering Based on Death to Confirmed Percentage

Time-series Based Clustering Based on Death to Population Percentage
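The report does not state its clustering method. As one possible sketch, the code below groups countries whose time series (for example, the daily death-to-population percentage) follow similar paths, using plain k-means with Euclidean distance. This is a swapped-in technique for illustration, not necessarily the one behind the figures; the function name `kmeans_series`, the deterministic initialization from the first `k` series, and the toy data are all assumptions. Time-series specific variants, such as k-means with dynamic time warping, may group differently shaped curves better.

```python
import math


def kmeans_series(series, k, iterations=20):
    """Cluster equal-length time series with plain k-means (Euclidean distance).

    series: dict mapping country -> list of floats, one value per date.
    Centroids start as the first k series (deterministic, for illustration;
    a real analysis would use several random restarts).
    Returns a dict mapping country -> cluster index.
    """
    names = list(series)
    centroids = [list(series[name]) for name in names[:k]]
    labels = {}
    for _ in range(iterations):
        # Assignment step: each series joins its nearest centroid.
        new_labels = {
            name: min(range(k), key=lambda j: math.dist(series[name], centroids[j]))
            for name in names
        }
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: each centroid becomes the date-by-date mean of its members.
        for j in range(k):
            members = [series[name] for name in names if labels[name] == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy series for four hypothetical countries: two low, two high trajectories.
toy = {"A": [0, 1, 2], "B": [0, 1, 3], "C": [10, 20, 30], "D": [11, 21, 29]}
print(kmeans_series(toy, k=2))  # {'A': 0, 'B': 0, 'C': 1, 'D': 1}
```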

Time-series Based Predictions

Two predictive models, one for confirmed cases and one for deaths, are developed from the available time-series data. 'ExponentialSmoothing' is used to train and test the models. All data available up to one month before the latest update are used for training, and the most recent month of data is used for testing. After the model is assessed with the Mean Absolute Percentage Error (MAPE), it is used for forecasting: a one-month forecast is made using all available data as the training sample.
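The text names 'ExponentialSmoothing' without giving its settings; that name matches the `ExponentialSmoothing` class in statsmodels (`statsmodels.tsa.holtwinters`), whose `fit()` estimates the smoothing parameters. To show the evaluate-then-forecast procedure described above without that dependency, the sketch below hand-rolls Holt's linear-trend exponential smoothing (one plausible configuration, assumed here) with fixed smoothing parameters `alpha` and `beta`, treats one month as 30 days, and scores the hold-out month with MAPE, i.e. 100/n times the sum of |(actual − forecast)/actual|. It also assumes that the "within the next month" figures reported below are the forecast total at the end of the horizon minus the latest observed total. All function names are hypothetical.

```python
def holt_forecast(y, steps, alpha=0.5, beta=0.5):
    """Holt's linear-trend exponential smoothing with fixed parameters.

    Level and trend are initialized from the first two observations and
    updated once per observation; the forecast extends the final trend.
    """
    level, trend = y[0], y[1] - y[0]
    for value in y[1:]:
        previous_level = level
        level = alpha * value + (1 - alpha) * (level + trend)
        trend = beta * (level - previous_level) + (1 - beta) * trend
    return [level + h * trend for h in range(1, steps + 1)]


def mape(actual, predicted):
    """Mean Absolute Percentage Error in percent (actual values must be nonzero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def evaluate_and_forecast(y, horizon=30, alpha=0.5, beta=0.5):
    """Hold out the last `horizon` days, score with MAPE, then forecast ahead."""
    train, test = y[:-horizon], y[-horizon:]
    test_mape = mape(test, holt_forecast(train, horizon, alpha, beta))
    future = holt_forecast(y, horizon, alpha, beta)  # rerun on the full series
    new_in_horizon = future[-1] - y[-1]  # growth over the forecast month
    return test_mape, future, new_in_horizon


# Toy cumulative series growing by exactly 10 per day: the trend model
# reproduces it, so the hold-out error is zero.
series = [10 * day for day in range(100)]
test_mape, future, new_in_horizon = evaluate_and_forecast(series)
print(test_mape, future[-1], new_in_horizon)  # 0.0 1290.0 300.0
```

With real cumulative counts, `alpha` and `beta` would be estimated on the training month rather than fixed, which is what a library fit routine does.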

Time-series Based Prediction for Confirmed Cases

Model performance on the test data (Mean Absolute Percentage Error, MAPE) = 1.98 %
Forecast total number of confirmed cases (world-wide) one month ahead = 91,827,228
Forecast number of new confirmed cases (world-wide) within the next month = 9,119,252

Time-series Based Prediction for Death Toll

Model performance on the test data (Mean Absolute Percentage Error, MAPE) = 0.97 %
Forecast total number of deaths (world-wide) one month ahead = 2,257,442
Forecast number of new deaths (world-wide) within the next month = 452,440

Conclusion:

From all the above analyses and visualizations of Covid-19 data, it is clear that the pandemic has brought immense human suffering across the world, although the impact has not been shared equally. Moreover, if the virus continues to spread at the same rate, the predictive models developed at the end of this report forecast a large loss of human life within the next month. Let's hope that the Covid-19 vaccines now being rolled out prove effective in protecting humanity by building immunity to the virus.